[core] Delay chunk evaluation #2918

Alxiice · 2025-10-13T08:40:28Z

Description

The goal of this PR is to change the chunk system so that it works with graphs whose nodes may have no specific size set.

Changes

Compute system

New computation levels have been added:
- extreme, which targets higher specs than intensive does
- script, a mode for simple single-process jobs. It targets specific machines on the farm that can run multiple jobs in parallel, so simple jobs should be processed faster
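
To illustrate how the two new levels could sit next to the existing ones, here is a minimal sketch. The `Level` enum, its values and the `pick_queue` helper are all hypothetical: they do not reproduce the contents of `meshroom/core/desc/computation.py`, and the queue names are invented for the example.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical computation levels; names follow the PR text, values are assumed."""
    NONE = 0
    NORMAL = 1
    INTENSIVE = 2
    EXTREME = 3   # assumed: targets higher specs than INTENSIVE
    SCRIPT = 4    # assumed: simple single-process jobs


def pick_queue(level: Level) -> str:
    """Map a level to an illustrative farm queue name (the names are invented)."""
    if level is Level.SCRIPT:
        # Simple jobs go to machines able to run several jobs in parallel.
        return "multi-slot"
    if level is Level.EXTREME:
        return "high-spec"
    return "default"
```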

Delayed chunks creation

- node
  - A `_chunksCreated` parameter has been added to check whether the chunks have been correctly initialized.
  - By default, `resetChunks` is now called when loading Meshroom; it creates a list with a single chunk, and the actual chunks can be created at any moment with `_createChunks`.
  - A new `NodeStatusData` has been added. It tracks a nodeStatus file, which is similar to the chunks' status files but specific to the node. This node status stores the cached range parametrization of the node, so that it can be retrieved when possible with `_createChunksFromCache`.
- taskManager
  - The TaskThread and TaskManager have been modified to launch the chunk creation when the node's computation starts.
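
The delayed creation described above can be sketched roughly as follows. This is a simplified model and not the code in `meshroom/core/node.py`: `LazyNode`, `Chunk`, `reset_chunks` and `create_chunks` are hypothetical stand-ins for the node, its chunks, `resetChunks` and `_createChunks`, and the range computation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """One unit of work covering items [start, end) of a node's input."""
    index: int
    start: int
    end: int


@dataclass
class LazyNode:
    """Hypothetical node whose size may be unknown until compute time."""
    block_size: int
    size: Optional[int] = None      # None: the size is not known yet
    chunks: List[Chunk] = field(default_factory=list)
    chunks_created: bool = False    # plays the role of _chunksCreated

    def reset_chunks(self) -> None:
        """On load: keep a single chunk and mark the real chunks as not created."""
        self.chunks = [Chunk(0, 0, 0)]
        self.chunks_created = False

    def create_chunks(self) -> None:
        """Called when the computation starts, once the size is known."""
        if self.size is None:
            raise ValueError("node size must be known before creating chunks")
        nb_blocks = max(1, -(-self.size // self.block_size))  # ceiling division
        self.chunks = [
            Chunk(i, i * self.block_size, min((i + 1) * self.block_size, self.size))
            for i in range(nb_blocks)
        ]
        self.chunks_created = True


node = LazyNode(block_size=40)
node.reset_chunks()      # a single chunk exists, nothing has been evaluated
node.size = 100          # the size only becomes available later
node.create_chunks()
print([(c.start, c.end) for c in node.chunks])  # [(0, 40), (40, 80), (80, 100)]
```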

Submitters

Submitters have been moved to another package (TBD where...)
The BaseSubmitter API has been updated to allow more flexibility. A new BaseSubmittedJob has been added and is used to track the jobs that have been created. The goal is to use it as an interface for calling actions on those jobs (stop/pause/restart...)
A new bin/meshroom_createChunks script has been created. It handles the chunk creation and the spooling of additional chunk tasks:
- Calls the chunk creation
- Checks if we can spool additional tasks
- If so, executes queue tasks that will compute the chunks
- If not, executes the chunks serially on the current process
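
The four steps above can be sketched as a small function. This is only an outline of the control flow under assumptions: the three callables are hypothetical stand-ins and are not functions from Meshroom or from `bin/meshroom_createChunks`.

```python
from typing import Callable, List, Optional


def run_create_chunks(
    create_chunks: Callable[[], List[int]],
    process_chunk: Callable[[int], None],
    spool_tasks: Optional[Callable[[List[int]], None]] = None,
) -> str:
    """Create the chunks, then either spool them or compute them serially.

    Returns "spooled" or "serial" to report which path was taken.
    """
    chunk_ids = create_chunks()       # 1. call the chunk creation
    if spool_tasks is not None:       # 2. can additional tasks be spooled?
        spool_tasks(chunk_ids)        # 3. yes: queue tasks that compute the chunks
        return "spooled"
    for chunk_id in chunk_ids:        # 4. no: compute the chunks serially here
        process_chunk(chunk_id)
    return "serial"
```

Passing `spool_tasks=None` models the case where the submitter cannot add tasks to the running job, so the current process computes every chunk itself.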

Changes to the Tractor API have been implemented here: meshroomHub/mrSubmitters#1

Examples

codecov · 2025-10-13T08:41:21Z

Codecov Report

❌ Patch coverage is 51.71849% with 295 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.95%. Comparing base (6d03825) to head (b25aeb7).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
meshroom/core/node.py	55.80%	179 Missing ⚠️
meshroom/core/submitter.py	40.71%	83 Missing ⚠️
meshroom/core/graph.py	44.82%	16 Missing ⚠️
meshroom/core/desc/node.py	40.00%	15 Missing ⚠️
meshroom/core/__init__.py	0.00%	1 Missing ⚠️
meshroom/core/desc/computation.py	83.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2918      +/-   ##
===========================================
- Coverage    80.82%   78.95%   -1.88%     
===========================================
  Files           59       59              
  Lines         7844     8301     +457     
===========================================
+ Hits          6340     6554     +214     
- Misses        1504     1747     +243

cbentejac

Additional notes:

Let's say I compute a graph (for example, the photogrammetry template) in a given location. All my nodes have been successfully computed, and their statuses are all set to "SUCCESS". Now let's say I do a "save as" on the graph I just computed and save it to a new location. The cache changes, so we expect the statuses of all the nodes to be reset. For nodes that have dynamic chunks, the display of the status is not refreshed, and we end up with nodes that appear as computed although they have lost all their status files.
Below is a screenshot of the photogrammetry graph that was computed and saved in a new location (the first branch was computed, the second was added from the template for the sake of comparison):

There are some unpredictable behaviours when performing actions that are allowed but probably shouldn't be. An example is clicking on "stop task" for a task that has already finished computing and is in the "SUCCESS" state, while nodes further down the graph are still being computed. The UI allows it: an info message says that the task has been successfully stopped, and the node's status is updated to "STOPPED" in the task manager and Graph Editor, but the "resume job" button is not enabled. This seems to cause the job on the farm to finish computing the current task and then pause the rest of the job (to be verified).
I have noticed on several occasions that when clicking on buttons from the "JOB" tab, the info message that is sent contains the name of a node that is not part of the job at all (if all my submitted tasks are for NodeName_2, I may get a message about NodeName_1, which is not being computed at all).

meshroom/core/node.py

bin/meshroom_createChunks

meshroom/core/node.py

meshroom/ui/qml/GraphEditor/TaskManager.qml

`canBeStopped` and `canBeCanceled` only took into account nodes that were being run locally. As nodes that have been submitted externally (exclusively on a render farm, not in another instance of Meshroom) can now be stopped and canceled as well, they are considered in both methods.
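
As a rough sketch of that rule, the two checks could look like the following. The `NodeState` class, its fields and the two enums are hypothetical and much simpler than the real node class; in particular, the `in_this_session` and `on_farm` flags are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    SUBMITTED = auto()
    RUNNING = auto()
    SUCCESS = auto()


class ExecMode(Enum):
    LOCAL = auto()    # computed by a local Meshroom instance
    EXTERN = auto()   # submitted externally


@dataclass
class NodeState:
    status: Status
    exec_mode: ExecMode
    in_this_session: bool = False   # assumed: local run started by this instance
    on_farm: bool = False           # assumed: submitted to a render farm

    def _controllable(self) -> bool:
        """Local runs of this session, or external runs on a render farm."""
        if self.exec_mode is ExecMode.LOCAL:
            return self.in_this_session
        return self.exec_mode is ExecMode.EXTERN and self.on_farm

    def can_be_stopped(self) -> bool:
        return self.status is Status.RUNNING and self._controllable()

    def can_be_canceled(self) -> bool:
        return self.status is Status.SUBMITTED and self._controllable()
```

A node running in another Meshroom instance is `EXTERN` with `on_farm=False` in this model, so neither check returns `True` for it.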

`chunkPlaceholder` is a list model that is meant to contain a placeholder chunk for nodes that have dynamic chunks: prior to the chunks' creation, it is used to reflect the state of the node as a single chunk. In particular, this makes it possible to accurately reflect the node's status while it is submitted but its chunks have not been created yet. The property is a list model so that it can be used directly as a model on the QML side.

`onGraphUpdated` updates the models for chunks, including the placeholder. This keeps the status up to date for all the existing chunks, as well as for the placeholder when a node's number of chunks hasn't been determined yet.
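
The placeholder idea can be illustrated without any Qt dependency. The helper below is a hypothetical, plain-Python sketch of the selection logic only; the actual property is a list model exposed to QML, which this does not reproduce.

```python
from typing import List


def displayed_chunk_statuses(
    node_status: str,
    chunk_statuses: List[str],
    chunks_created: bool,
) -> List[str]:
    """Return the statuses the UI would show, one entry per displayed chunk."""
    if not chunks_created:
        # Chunks do not exist yet: show the node as one placeholder chunk.
        return [node_status]
    return list(chunk_statuses)
```

With this model, a node that is submitted but whose chunks have not been created yet is displayed as a single "SUBMITTED" chunk rather than as an empty bar.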

Alxiice self-assigned this Oct 13, 2025

Alxiice added the feature new feature (proposed as PR or issue planned by dev) label Oct 13, 2025

Alxiice added this to the Meshroom 2026.1.0 milestone Oct 13, 2025

Alxiice force-pushed the dev/delayChunkEvaluation branch 2 times, most recently from 02bae2f to 15df47f Compare October 21, 2025 09:56

Alxiice changed the base branch from develop to dev/remove_submitters October 21, 2025 09:56

Alxiice requested review from cbentejac and fabiencastan October 28, 2025 08:45

cbentejac force-pushed the dev/remove_submitters branch from 4b47850 to 51cfd91 Compare November 4, 2025 14:10

Base automatically changed from dev/remove_submitters to develop November 4, 2025 16:24

cbentejac force-pushed the dev/delayChunkEvaluation branch from 15b7db5 to ecbcbed Compare November 4, 2025 16:36

cbentejac requested changes Nov 5, 2025

View reviewed changes

cbentejac force-pushed the dev/delayChunkEvaluation branch 4 times, most recently from 8a599b4 to e720c8b Compare November 24, 2025 17:04

cbentejac force-pushed the dev/delayChunkEvaluation branch 2 times, most recently from b35b0e7 to 96e9acb Compare November 26, 2025 13:47

Alxiice added 10 commits November 26, 2025 15:35

[core] Update submitter API

2eca1dc

[core] node/taskManager: create _chunksCreated to delay chunk creatio…

b7c9594

…n, create a NodeStatusData

[node] Add licenses list on node desc to provide a source where we ca…

90df8e2

…n fetch licenses required for the submitter

[core] computation : update computation levels

1d64ca9

[bin] Add createChunks script

3ad4182

[submitter] Fix SubmitterOptionsEnum.ALL mode on py 3.9

577f4c3

[qml] Fix anchor issue when chunks are emptied

5ed11ef

[core] Node : add defaultStatus in _createChunks

6a459d5

[core] Start updating taskmanager and submitter for new chunk process

254c106

[core] First implementation to kill submitted tasks

0dda5c1

Alxiice and others added 10 commits November 26, 2025 15:41

[core] submitter : retrieve job on node update + fix some ui issues

ac02b20

[chunks] Apply typos/cleaning suggestions from @cbentejac

747e8e8

[submitter] Add tools to avoid autoretry on farm

527cff5

[submitter] Fix interruptJob UI updates

bf666e7

[bin] Update permissions on meshroom_createChunks

7770d5c

[core] node: Correctly use custom size for non-parallelized nodes

9492acb

[bin] Fix typo: Replace occurrences of "infos" with "info"

445cab2

[core] Fix typo: Replace all occurrences of "infos" with "info"

5aca21f

[core] Linting: Remove all trailing whitespaces

d12e434

[core] node: Remove references to packageVersion in NodeStatusData

c169a1d

cbentejac force-pushed the dev/delayChunkEvaluation branch 2 times, most recently from de92eca to bc6ef4a Compare November 27, 2025 13:52

cbentejac added 6 commits December 1, 2025 17:19

[core] node: Remove static info from the chunks' status file

02f50d9

`nodeType` and `packageName` are already part of the node's status file, and there is thus no need to propagate them to the chunks'.

Linting: Remove trailing whitespaces

af397c2

[core] node: Use explicit keys for chunks' blockSize, fullSize and nb…

ced2c05

…Blocks in status files

[ui] Add stoppable state for the Submit button

67a67b0

When a job has been submitted, the state of the `Submit` button switches to `stoppable`. Clicking it in this state interrupts the node's computation on the farm.

[ui] NodeActions: Add Retry button for submitted tasks on error state

4f96f9d

cbentejac force-pushed the dev/delayChunkEvaluation branch from dede3a7 to 4f96f9d Compare December 1, 2025 16:19

cbentejac added 9 commits December 1, 2025 17:52

[GraphEditor] NodeChunks: Remove specific color for dynamic chunks

5c468f4

[ui] NodeActions: Fix status of compute and submit in deletable…

1222ec5

… state

[GraphEditor] Add "Retry Error Tasks" menu to match NodeActions

c22e40f

[GraphEditor] Add "Interrupt/Cancel Job" menus

b2d9ab6

[GraphEditor] NodeChunks: Don't add specific display when there's no …

60074bd

…chunk

[ui] Use chunk placeholder for uncreated dynamic chunks

ccbc890

[ui] Application: Update state of the global "Submit" icon when needed

22ae573

cbentejac force-pushed the dev/delayChunkEvaluation branch from b371eef to 22ae573 Compare December 3, 2025 15:46

.git-blame-ignore-revs: Add linting commits

b25aeb7
